Skip to content

add linux_aarch64#426

Closed
dslarm wants to merge 82 commits into
conda-forge:mainfrom
dslarm:linux-aarch64
Closed

add linux_aarch64#426
dslarm wants to merge 82 commits into
conda-forge:mainfrom
dslarm:linux-aarch64

Conversation

@dslarm
Copy link
Copy Markdown
Contributor

@dslarm dslarm commented Mar 20, 2025

  • Making a start on the linux-aarch64 support - just updating the config/CI initially

@conda-forge-admin
Copy link
Copy Markdown
Contributor

conda-forge-admin commented Mar 20, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13979950481. Examine the logs at this URL for more detail.

Comment thread conda-forge.yml
@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Mar 20, 2025

@conda-forge-admin please rerender

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Mar 20, 2025

@conda-forge-admin please rerender

@conda-forge-admin
Copy link
Copy Markdown
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • ❌ Selectors are suggested to take a <two spaces>#<one space>[<expression>] form. See lines [60]

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13979961254. Examine the logs at this URL for more detail.

@conda-forge-admin
Copy link
Copy Markdown
Contributor

conda-forge-admin commented Mar 20, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/14972143517. Examine the logs at this URL for more detail.

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Mar 20, 2025

@conda-forge-admin please rerender

@conda-forge-admin
Copy link
Copy Markdown
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13980089645. Examine the logs at this URL for more detail.

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Mar 20, 2025

===== stdout start =====
Collecting h5py==3.11.0 (from -r /tmp/tmpqy0hn6u_ (line 1))
  Downloading h5py-3.11.0.tar.gz (406 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 406.5/406.5 kB 677.1 kB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'error'
===== stdout end =====
===== stderr start =====
  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [3 lines of output]
      Ignoring oldest-supported-numpy: markers 'python_version == "3.8"' don't match your environment
      ERROR: Could not find a version that satisfies the requirement Cython<4,>=0.29.31 (from versions: none)
      ERROR: No matching distribution found for Cython<4,>=0.29.31
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

however .. according to the logs: cyhon is: cython: 3.0.12-py311hc8540bd_0 conda-forge

@dslarm dslarm marked this pull request as ready for review May 14, 2025 09:01
Comment thread recipe/build.sh
@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented May 23, 2025

There is now support for Arm github native runners in conda-smithy:
conda-forge/conda-smithy#2322
does this change next steps and our config here?

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented May 28, 2025

@hmaarrfk - could I get your input on this one please? we must be close to done?

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Jun 5, 2025

@conda-forge/core - could someone merge this for me, or tell me what they see as missing? AFAICT, it's been ready for about 3-4 weeks now.

@isuruf
Copy link
Copy Markdown
Member

isuruf commented Jun 5, 2025

@dslarm we'll let @hmaarrfk handle this one. Let's give him some time.

I have a question though. Since tensorflow is a pretty hairy package to build, linux-aarch64 will need some love from someone. Would you be willing to watch this repo and help out if any linux-aarch64 issues comes along?

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Jun 5, 2025

@isuruf - yes, I'll watch out for aarch64. Now we're going down the native compilation path, it'll behave itself much better!

@hmaarrfk
Copy link
Copy Markdown
Contributor

hmaarrfk commented Jun 8, 2025

The problem with this feedstock is that the overlap between maintainers and "conda-forge core" members is VERY high.

I'm just not super excited about moving to Cuda 12.8 and making this feedstock "more unique".

I would prefer we keep the defaults and work together to get things moving forward when we start the Cuda 12.8 migration.

I want to:

  1. Get this PR drop CUDA 11.8 conda-forge-pinning-feedstock#7431 merged through
  2. Start the 12.8 migration conda-forge wide, even for x86

This unfortunately means waiting longer to get this into conda-forge.

In the mean time, you can:

  1. start bulding these for yourself. (keep as 2 commits, new additions, then rerender)
  2. start to look into what it would take to update to the next versions of tensorflow 2.19

don't take this the wrong way, this recipe is SUPER hard to maintain, so "adding" to coordination and maintenance is a "negative" unless..... you show that you can help with the "general maintenance" of keeping tensorflow updated with our ecosystem....

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Jun 8, 2025

To get this one through, we could drop the CUDA on linux-aarch64 - by adding a version check for cuda < 12.8, then when cuda >= 12.8 is used, it'll start to build.

We can then drop the migration file.

Ideally we could get 2.18 out of the door before 2.19: the tendency for bioconda's users to pin to older versions means stumbling over "but version X isn't supported" for some usually pretty minor version bump.. so the earlier versions are appreciated.

@hmaarrfk
Copy link
Copy Markdown
Contributor

To get this one through, we could drop the CUDA on linux-aarch64 - by adding a version check for cuda < 12.8, then when cuda >= 12.8 is used, it'll start to build.

How does this help you?

You could just "clean up the commits here" (i think you have the git skills to do a rebase) and then it will be ready when it is time. No need for gymnastics on the main branch.

I'm not a fan of merging "code that isn't run".

Ideally we could get 2.18 out of the door before 2.19

We could make a migration branch for 2.18.

I really just want the conda-forge wide migration for CUDA 12.8 to start before we add it to the hardest recipe to maintain.

Do some cleanups to the changes (consolidate into 1 commit) and try to help "upgrade" the recipe for all architectures to CUDA 12.8.

@hmaarrfk
Copy link
Copy Markdown
Contributor

@h-vetinari
Copy link
Copy Markdown
Member

XREF: conda-forge/conda-forge-pinning-feedstock#7005

Here's a worked-out example for using the CUDA 12.9 migration: conda-forge/pytorch-cpu-feedstock#393

It currently requires a development version of conda-smithy, so unless you're comfortable installing that from my branch and then rerendering locally, I recommend to wait a little longer. If you want I can also update the PR for you.

@hmaarrfk
Copy link
Copy Markdown
Contributor

It currently requires a development version of conda-smithy, so unless you're comfortable installing that from my branch

Thank you @h-vetinari.

I think the commits went through.


To recap:

I'm not interested in rushing aarch64 any features for the tensorflow package. I understand that there is an immediate need for this, but unless we can get help "unclogging" some parts of the conda-forge ecosystem, I'm really not interested in expanding the support burden.

One aspect would be to get the builds to be "green again" (even by pinning to older dependencies)
The other would be to update to cuda 12.8 or 12.9 for all platforms (thank you h-vetinari)

I understand that this bar might not be "what you signed up for" but the ability to upload to your own channel always exists as a stop gap.

Other maintainers here might have different opinions. If you cleanup the "81 commits" into a few, that might make them able to review your changes.

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Jun 14, 2025

Other maintainers here might have different opinions. If you cleanup the "81 commits" into a few, that might make them able to review your changes.

I'm happy to close this PR and open a new one with the handful of relevant commits - there are a lot of unused/reverted/smithy etc of tried and failed.. I figure a new PR this is the easiest way for a rebase/squash.

@hmaarrfk
Copy link
Copy Markdown
Contributor

Other maintainers here might have different opinions. If you cleanup the "81 commits" into a few, that might make them able to review your changes.

I'm happy to close this PR and open a new one with the handful of relevant commits - there are a lot of unused/reverted/smithy etc of tried and failed.. I figure a new PR this is the easiest way for a rebase/squash.

Yes. That helps too!!

@h-vetinari
Copy link
Copy Markdown
Member

The smithy changes now landed, so if you copy the 12.9 migrator (not yet merged, but hopefully soon; until that happens you need to add use_local: true like in the pytorch PR I referenced), you should be able to rerender correctly without further hassle.

Also, +1 to a rebase, though I'd suggest to force-push into this branch, to keep the context/discussion of this PR (but I leave it up to you if you prefer starting with a clean slate).

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Jun 27, 2025

@conda-forge-admin please rerender

(making a start on rebase)

@conda-forge-admin
Copy link
Copy Markdown
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I was trying to look for recipes to lint for you, but it appears we have a merge conflict. Please try to merge or rebase with the base branch to resolve this conflict.

Please ping the 'conda-forge/core' team (using the @ notation in a comment) if you believe this is a bug.

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Jul 22, 2025

Hi again - an update from my side. I've been unable to get the focus time for this PR recently, but am now able to.

Latest is that the gcc-14 migration is causing compiler errors, due to being more (perhaps rightfully in this case) picky. We're not the first to encounter compilation errors for aarch64 NEON in XNNPACK (google/XNNPACK#7726)
It's fixed by: google/XNNPACK@3bc2a32

It doesn't apply cleanly to the specific cut of XNNPACK that tensorflow 2.18 is using. I have more digging to do to propose a fix for this.

@h-vetinari
Copy link
Copy Markdown
Member

It doesn't apply cleanly to the specific cut of XNNPACK that tensorflow 2.18 is using. I have more digging to do to propose a fix for this.

What we're doing in pytorch and jaxlib is simply to stay on GCC 13. But thanks a lot for digging up that reference!

This makes me a bit sentimental that conda-forge/staged-recipes#19103 resp. conda-forge/staged-recipes#15865 never made it. It would be nice to have one xnnpack that we use in all the ml frameworks. Even if that might not work due to API changes and conflicting versions.... Alas 😑

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Jul 31, 2025

bumping up to cuda 12.9 gives a couple of errors, I'm looking into - is on x86 or aarch64..

external/local_xla/xla/service/gpu/gpu_prim.h(41): error: no instance of overloaded function "cub::ThreadStoreVolatilePtr" matches the specified type
  __attribute__((device)) __inline__ __attribute__((always_inline)) void ThreadStoreVolatilePtr<Eigen::half>(
                                                                         ^

external/local_xla/xla/service/gpu/gpu_prim.h(54): error: no instance of overloaded function "cub::ThreadStoreVolatilePtr" matches the specified type
  __attribute__((device)) __inline__ __attribute__((always_inline)) void ThreadStoreVolatilePtr<tsl::bfloat16>(
                                                                         ^

2 errors detected in the compilation of "external/local_xla/xla/service/gpu/cub_sort_kernel.cu.cc".

@h-vetinari
Copy link
Copy Markdown
Member

Note that the main branch currently also fails with 12.9, see #433. I'd stay on 12.6 until that's sorted out

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Jul 31, 2025

Note that the main branch currently also fails with 12.9, see #433. I'd stay on 12.6 until that's sorted out

Thanks - I've got a patch for the CUDA errors with 12.9 - which now gets a lot further on x86_64 (23,000 in the progress of compiles..) with the pinnings for things like abseil/protobuf as they currently stand in this branch. But not aarch64, as it's failing on intrinsics definitions in gcc 13 (or so it appears, or is that the nvcc bug that we had before...) but won't compile with gcc 14 anyway... That patch is here - in case it helps future: 0032-gpu_prim-error.txt - it extends a patch I needed for cuda 12.8.

So, it's back to cuda 12.8 for me - the first/only that I can get to build with aarch64..

(FWIW - the missing intrinsics error, a snippet of..

/home/conda/feedstock_root/build_artifacts/tensorflow-split_1753954693523/_build_env/lib/gcc/aarch64-conda-linux-gnu/13.4.0/include/arm_neon.h(6176): error: identifier "__builtin_aarch64_st3_lanev16qi" is undefined
    __builtin_aarch64_st3_lanev16qi ((__builtin_aarch64_simd_qi *) __ptr, __val,
    ^

/home/conda/feedstock_root/build_artifacts/tensorflow-split_1753954693523/_build_env/lib/gcc/aarch64-conda-linux-gnu/13.4.0/include/arm_neon.h(6182): error: identifier "int16x8x3_t" is undefined
  vst3q_lane_s16 (int16_t *__ptr, int16x8x3_t __val, const int __lane)
                                  ^

@h-vetinari
Copy link
Copy Markdown
Member

I mean, if you get it to build with 12.8 that's fair game! (On the pytorch feedstock I needed to downgrade 12.9 -> 12.8 on windows due to build issues, for example)

@dslarm
Copy link
Copy Markdown
Contributor Author

dslarm commented Aug 1, 2025

replaced by #438

@dslarm dslarm closed this Aug 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants